background-image: url(https://raw.githubusercontent.com/tidyverse/purrr/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 7.5%
layout: true
##purrr: Functional Programming Tools
purrr facilitates functional programming (FP) with vectors, lists, and data frame objects (e.g., tibbles) in R. Whenever you would normally reach for a for-loop to solve an iterative problem, the family of map_*() functions lets you rephrase the problem as a data manipulation pipeline.
Four types of map_*() function:

- map(.x, .f, ...) takes the input .x and applies .f to each element of .x.
- group_map(.data, .f, ...) takes a grouped tibble and applies .f to each subgroup (this function is provided by dplyr rather than purrr).
- map2(.x, .y, .f, ...) takes the inputs .x and .y and applies .f to the paired elements of .x and .y in parallel.
- pmap(.l, .f, ...) takes a list .l of inputs and applies .f to the elements of those inputs in parallel.
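The difference between the iterating variants is easiest to see on small inputs. The following is a minimal sketch on made-up integer vectors (the variable names are illustrative only):

```r
library(purrr)

# map(): one input, .f is applied to each element
squares <- map(1:3, ~ .x^2)

# map2(): two inputs of equal length, processed pairwise
sums <- map2(1:3, 4:6, ~ .x + .y)

# pmap(): any number of inputs bundled in a list;
# the list names are matched to the argument names of .f
products <- pmap(
  list(a = 1:3, b = 4:6, c = 7:9),
  function(a, b, c) a * b * c
)

unlist(squares)   # 1 4 9
unlist(sums)      # 5 7 9
unlist(products)  # 28 80 162
```

All three calls return a list with one entry per iteration, which is why unlist() is used here for compact printing.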
.pull-left[
By default, map() returns a list. If you want to be more explicit about the output type, you can use

- map_lgl() to receive a logical vector,
- map_chr() to receive a character vector,
- map_int() to receive an integer vector,
- map_dbl() to receive a double vector, or
- map_df() to receive a data frame.
]
.pull-right[
The input .x to any map_*() function can be a vector, a list, or a data frame.

- Vector: iteration over vector entries
- List: iteration over list elements
- Data frame: iteration over columns
]
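As a small illustration of the typed variants, the sketch below runs them on a made-up data frame and two made-up lists. Note that iterating over a data frame means iterating over its columns.

```r
library(purrr)

# Toy data frame (illustrative values only)
df <- data.frame(
  a = c(1.5, 2.5, NA),
  b = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

col_types   <- map_chr(df, class)                 # one string per column
has_missing <- map_lgl(df, ~ any(is.na(.x)))      # one logical per column
col_means   <- map_dbl(list(u = 1:4, v = c(2, 4)), mean)
sizes       <- map_int(list(1:3, letters), length)

col_types
has_missing
col_means
sizes
```

The typed variants fail with an error when .f returns something that does not fit the requested type, which makes them a cheap safeguard compared with a plain map().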
??? Comments
##purrr: Functional Programming Tools
Use Case: Apply the z-normalization to multiple variables
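Define a function that takes a numeric vector .x as input.
z_transform <- function(.x) {
  mu <- mean(.x)     # sample mean
  sigma <- sd(.x)    # sample standard deviation
  (.x - mu) / sigma  # centre and scale
}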
Draw random samples with rnorm(n, mean, sd) and store them as double vectors in a list.
samples <- list(sample1 = rnorm(10, 75, 22), sample2 = rnorm(10, 52, 11), sample3 = rnorm(10, 99, 33))
## $sample1
## [1] 75.68782 125.66495 83.25217 90.76536 117.68463 52.62998 56.95321 78.48313
## [9] 80.83747 78.62980
##
## $sample2
## [1] 68.34106 50.80146 54.25619 49.83732 56.72566 50.77191 44.80929 47.04905 54.57193
## [10] 48.12263
##
## $sample3
## [1] 90.96179 132.67138 78.97472 90.17809 177.76370 102.07773 73.75073 89.41950
## [9] 96.61828 48.50498
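Before applying the function to many inputs, it can help to sanity-check it on a single vector: a z-normalized vector should have mean 0 and standard deviation 1. A minimal sketch with a made-up input vector:

```r
# Same definition as on the slide, written compactly
z_transform <- function(.x) {
  (.x - mean(.x)) / sd(.x)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative values
z <- z_transform(x)

mean(z)  # zero up to floating-point error
sd(z)    # one up to floating-point error
```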
??? comment
##purrr: Functional Programming Tools
Use Case: Applying the z-normalization to multiple variables
Apply z_transform() to every sample with a for-loop.
for (s in samples) {
  print(z_transform(s))
}
Apply z_transform() to every sample with map().
map(.x = samples, .f = ~z_transform(.x)) # equivalent to map(samples, z_transform)
## $sample1
## [1] -0.36358052 1.80708422 -0.03503686 0.29128489 1.46047378 -1.36505525 -1.17728370
## [8] -0.24217110 -0.13991465 -0.23580082
##
## $sample2
## [1] 2.3803664 -0.2600081 0.2600609 -0.4051472 0.6318090 -0.2644558 -1.1620556
## [8] -0.8248872 0.3075912 -0.6632736
##
## $sample3
## [1] -0.20242455 0.98168256 -0.54272922 -0.22467325 2.26182333 0.11314960 -0.69103477
## [8] -0.24620885 -0.04184039 -1.40774447
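The for-loop above only prints its results. To keep them, the loop has to pre-allocate a container and fill it by index, which is the bookkeeping that map() takes over. A hedged sketch with a small deterministic stand-in for the random samples:

```r
library(purrr)

z_transform <- function(.x) (.x - mean(.x)) / sd(.x)

# Deterministic stand-in for the random samples on the slide
samples <- list(a = c(1, 2, 3), b = c(10, 20, 60))

# for-loop version: pre-allocate, copy the names, fill by index
results_loop <- vector("list", length(samples))
names(results_loop) <- names(samples)
for (i in seq_along(samples)) {
  results_loop[[i]] <- z_transform(samples[[i]])
}

# map() version: one line, names are carried over automatically
results_map <- map(samples, z_transform)

identical(results_loop, results_map)
```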
??? comments
##purrr: Functional Programming Tools
Use Case: Applying the z-normalization to multiple variables
Use map() but with an anonymous function.
map(
  .x = samples,
  .f = function(.x) {
    (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE)
  })
Use map() but with a purrr-style lambda (formula) function.
map(
  .x = samples,
  .f = ~(.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE))
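A third spelling exists if you can assume R 4.1 or later (an assumption, the slides do not state an R version): the base R lambda shorthand \(x). The sketch below compares it with the purrr-style formula on made-up data that contains a missing value.

```r
library(purrr)

# Illustrative input; the NA shows why na.rm = TRUE matters
samples <- list(a = c(1, 2, 3, NA), b = c(10, 20, 60))

# Base R lambda shorthand (requires R >= 4.1)
z_lambda <- map(samples, \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

# purrr-style formula
z_formula <- map(samples, ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE))

identical(z_lambda, z_formula)
z_lambda$a  # the NA stays NA, the other entries are normalized
```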
??? comments
##purrr: Functional Programming Tools
.center[ This is great, right?!
]
--
.center[
Now let us look at some other practical use cases!]
??? src: https://tenor.com/view/the-office-finger-guns-right-on-steve-carell-michael-scott-gif-4724041
##purrr: Functional Programming Tools
Check the data types of my columns:
penguins %>%
map_df(class)
## # A tibble: 1 x 8
## species island bill_length_mm bill_depth_mm flipper_length_~ body_mass_g sex year
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 charact~ charac~ numeric numeric numeric numeric chara~ nume~
Check the number of missing values per column:
penguins %>%
map_df(~sum(is.na(.)))
## # A tibble: 1 x 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
## <int> <int> <int> <int> <int> <int> <int> <int>
## 1 0 0 2 2 2 2 11 0
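If a named vector is more convenient than a one-row tibble, map_int() is a possible alternative. The sketch below uses a small made-up data frame instead of penguins so that it runs without extra packages:

```r
library(purrr)

# Toy data with a few missing values (illustrative only)
toy <- data.frame(
  id     = 1:4,
  weight = c(3.2, NA, 4.1, NA),
  sex    = c("f", NA, "m", "m"),
  stringsAsFactors = FALSE
)

# One integer per column, named after the column
na_counts <- map_int(toy, ~ sum(is.na(.x)))
na_counts

# Keep only the columns that actually contain missing values
na_counts[na_counts > 0]
```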
??? comments
##purrr: Functional Programming Tools
Check the number of distinct values per column:
penguins %>%
map_df(n_distinct)
## # A tibble: 1 x 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
## <int> <int> <int> <int> <int> <int> <int> <int>
## 1 3 3 165 81 56 95 3 3
??? comments
##purrr: Functional Programming Tools
Check the highest value in each subset of the data (e.g., largest flipper_length_mm per sex):
penguins %>%
  drop_na() %>%
  group_by(sex) %>%
  group_map(~slice_max(., flipper_length_mm, n = 1), .keep = TRUE)
## [[1]]
## # A tibble: 1 x 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Gentoo Biscoe 46.9 14.6 222 4875 female 2009
##
## [[2]]
## # A tibble: 1 x 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Gentoo Biscoe 54.3 15.7 231 5650 male 2008
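The .keep argument decides whether the grouping column is still part of each subgroup that .f receives. A minimal sketch of the difference, using the built-in mtcars data as a stand-in for penguins (grouped by cyl, picking the heaviest car per group):

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# .keep = TRUE: the grouping column stays in each subgroup
with_key <- by_cyl %>%
  group_map(~ slice_max(.x, wt, n = 1), .keep = TRUE)

# default (.keep = FALSE): the grouping column is dropped
without_key <- by_cyl %>%
  group_map(~ slice_max(.x, wt, n = 1))

length(with_key)                      # one list entry per group
"cyl" %in% names(with_key[[1]])
"cyl" %in% names(without_key[[1]])
```

group_map() always returns a list. If a single data frame is preferred, calling slice_max() directly on the grouped data is another option, since it also operates per group.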
??? drop_na: because otherwise I would also have a subgroup of NA
##purrr: Functional Programming Tools
map2() also comes in handy if you would like to produce a series of plots of the same type, each depicting a separate subset of the underlying data:
species <- penguins %>% distinct(species, year) %>% pull(species) # .x argument for map2()
years <- penguins %>% distinct(species, year) %>% pull(year) # .y argument for map2()
penguin_plots <- map2(
.x = species,
.y = years,
.f = ~{
penguins %>%
drop_na %>%
filter(species == .x, year == .y) %>%
ggplot() +
geom_point(aes(x = bill_length_mm, y = body_mass_g)) +
labs(title = glue::glue("Scatter Plot Bill Length vs. Body Mass ({.x}, {.y})"))
})
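One possible variation is to compute the combinations once and to name the resulting list, so that a plot can be looked up by label instead of by position. The sketch below assumes ggplot2 is installed and uses the built-in mtcars data as a stand-in for penguins; the label format "cyl_am" is an arbitrary choice.

```r
library(dplyr)
library(purrr)
library(ggplot2)

# All cyl/am combinations that occur in the data, computed once
combos <- distinct(mtcars, cyl, am)

plots <- map2(
  .x = combos$cyl,
  .y = combos$am,
  .f = ~ {
    mtcars %>%
      filter(cyl == .x, am == .y) %>%
      ggplot() +
      geom_point(aes(x = wt, y = mpg)) +
      labs(title = paste0("Weight vs. mpg (cyl = ", .x, ", am = ", .y, ")"))
  }
)

# Name each plot after its combination, e.g. "6_1"
plots <- set_names(plots, paste(combos$cyl, combos$am, sep = "_"))

names(plots)
```

A single plot can then be retrieved with, for example, plots[["6_1"]].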
??? comments
##purrr: Functional Programming Tools
.pull-left[
penguin_plots[[1]]
] .pull-right[
penguin_plots[[4]]
]
??? comments
##purrr: Functional Programming Tools
Finally, map() is really powerful in the context of modelling. In the following we fit a linear regression model for each species-island subset.
Create a nested tibble of data for each species-island combination.
nested_penguins <- penguins %>%
  drop_na() %>%
  group_by(species, island) %>%
  nest()
## # A tibble: 5 x 3
## # Groups: species, island [5]
## species island data
## <chr> <chr> <list>
## 1 Adelie Torgersen <tibble [47 x 6]>
## 2 Adelie Biscoe <tibble [44 x 6]>
## 3 Adelie Dream <tibble [55 x 6]>
## 4 Gentoo Biscoe <tibble [119 x 6]>
## 5 Chinstrap Dream <tibble [68 x 6]>
.pull-right[ .footnote[ Note: To access elements in a nested tibble, you may use pluck(). For example, to access the first tibble in the column data, you may run nested_penguins %>% pluck("data", 1).]]
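To make the pluck() note concrete, here is a minimal sketch on a nested version of the built-in mtcars data (a stand-in for nested_penguins). It also shows that pluck() returns a default value instead of failing when the requested position does not exist.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- mtcars %>%
  group_by(cyl) %>%
  nest()

# First tibble in the list-column "data"
first_subset <- pluck(nested, "data", 1)

# Same element, accessed with base R subsetting
identical(first_subset, nested$data[[1]])

# Position 10 does not exist: pluck() returns the default (NULL)
pluck(nested, "data", 10, .default = NULL)
```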
??? comments
##purrr: Functional Programming Tools
Fit a linear model to each nested tibble, in which body_mass_g is regressed (~) on all other variables (denoted by a dot in the lm() formula).
nested_penguins <- nested_penguins %>%
  mutate(lin_reg = map(.x = data, .f = ~lm(body_mass_g ~ ., data = .x)))
## # A tibble: 5 x 4
## # Groups: species, island [5]
## species island data lin_reg
## <chr> <chr> <list> <list>
## 1 Adelie Torgersen <tibble [47 x 6]> <lm>
## 2 Adelie Biscoe <tibble [44 x 6]> <lm>
## 3 Adelie Dream <tibble [55 x 6]> <lm>
## 4 Gentoo Biscoe <tibble [119 x 6]> <lm>
## 5 Chinstrap Dream <tibble [68 x 6]> <lm>
??? comments
##purrr: Functional Programming Tools
Apply summary() to each model and extract the model coefficients as a tibble. Finally, use the unnest() function to receive a tidy data frame.
nested_penguins %>%
  mutate(coefs = map(lin_reg, ~summary(.x) %>% .$coefficients %>% as_tibble)) %>%
  unnest(coefs)
## # A tibble: 30 x 8
## # Groups: species, island [5]
## species island data lin_reg Estimate `Std. Error` `t value` `Pr(>|t|)`
## <chr> <chr> <list> <list> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie Torgersen <tibble [47 x 6]> <lm> 449264. 130401. 3.45 0.00133
## 2 Adelie Torgersen <tibble [47 x 6]> <lm> 4.20 17.3 0.243 0.809
## 3 Adelie Torgersen <tibble [47 x 6]> <lm> -62.0 54.6 -1.14 0.263
## 4 Adelie Torgersen <tibble [47 x 6]> <lm> 15.5 8.74 1.77 0.0838
## # ... with 26 more rows
.footnote[ Note: You may eventually want to drop the lin_reg and data columns; otherwise you carry around a lot of redundant data in your tibble, which may quickly exhaust your memory.]
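One detail worth knowing: as_tibble() drops the row names of the coefficient matrix by default, so the table no longer says which row belongs to which term. The sketch below is one possible way to keep them, using the rownames argument of as_tibble(), and it also drops the bulky list-columns as suggested in the note. It uses the built-in mtcars data and a deliberately simple model (mpg ~ wt) as stand-ins.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

coef_table <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    # one model per subgroup
    lin_reg = map(data, ~ lm(mpg ~ wt, data = .x)),
    # coefficient matrix as a tibble, row names stored in column "term"
    coefs = map(lin_reg, ~ as_tibble(coef(summary(.x)), rownames = "term"))
  ) %>%
  select(cyl, coefs) %>%   # drop the data and lin_reg list-columns
  unnest(coefs) %>%
  ungroup()

coef_table
```

The result has one row per group and term, with the usual Estimate, Std. Error, t value, and Pr(>|t|) columns next to the new term column.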
??? there are packages for automatically doing this with just one line of code, see broom
##purrr: Functional Programming Tools
.pull-left[ .center[ How you probably feel right now
]]
--
.pull-right[ .center[ How you will feel after mastering the intricacies of FP
]]
.footnote[ .pull-left[ For a great tutorial that helps you master the notion of functional programming with R, see this blog post by Rebecca Barter.]]